AI Safety Gridworlds
Abstract
We present a suite of reinforcement learning environments illustrating various safety properties of intelligent agents. These problems include safe interruptibility, avoiding side effects, absent supervisor, reward gaming, safe exploration, as well as robustness to self-modification, distributional shift, and adversaries. To measure compliance with the intended safe behavior, we equip each environment with a performance function that is hidden from the agent. This allows us to categorize AI safety problems into robustness and specification problems, depending on whether the performance function corresponds to the observed reward function. We evaluate A2C and Rainbow, two recent deep reinforcement learning agents, on our environments and show that they are not able to solve them satisfactorily.
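The split between the observed reward and the hidden performance function can be illustrated with a minimal sketch. It is not one of the paper's environments or its API: the layout, the class `ToyGridworld`, the helper `run`, and all reward and penalty values are hypothetical choices made for illustration. The agent only ever sees the reward returned by `step`, while the evaluator reads a separate performance score that also penalizes an unwanted side effect.

```python
from dataclasses import dataclass

# Hypothetical toy layout (an assumption, not taken from the paper):
# A = agent start, G = goal, V = fragile object the designer wants left intact.
GRID = [
    "A.V.G",
    ".....",
]
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


@dataclass
class ToyGridworld:
    pos: tuple = (0, 0)
    broken: bool = False
    done: bool = False
    _hidden_performance: float = 0.0  # never returned by step()

    def step(self, action):
        """Apply one move and return (observation, reward, done)."""
        dr, dc = MOVES[action]
        # Clamp to the grid so that walls simply block movement.
        r = min(max(self.pos[0] + dr, 0), len(GRID) - 1)
        c = min(max(self.pos[1] + dc, 0), len(GRID[0]) - 1)
        self.pos = (r, c)
        cell = GRID[r][c]

        reward = -1.0  # assumed step cost
        if cell == "G":
            reward += 10.0  # assumed goal bonus
            self.done = True

        penalty = 0.0
        if cell == "V" and not self.broken:
            self.broken = True
            penalty = 5.0  # assumed side-effect penalty, invisible to the agent

        self._hidden_performance += reward - penalty
        return self.pos, reward, self.done

    def performance(self):
        """Evaluator-only score; an agent would not be given access to this."""
        return self._hidden_performance


def run(actions):
    """Play a fixed action sequence; return (observed return, hidden performance)."""
    env = ToyGridworld()
    total = 0.0
    for action in actions:
        _, reward, done = env.step(action)
        total += reward
        if done:
            break
    return total, env.performance()


print(run(["right"] * 4))                     # (6.0, 1.0): shortest path breaks V
print(run(["down"] + ["right"] * 4 + ["up"]))  # (4.0, 4.0): detour leaves V intact
```

In this sketch the path with the higher observed return scores lower on the hidden performance function, which is the situation the abstract classifies as a specification problem. If the penalty were set to zero, reward and performance would coincide, and any remaining failure would fall under robustness instead.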
Similar resources
Gridworlds as Testbeds for Planning with Incomplete Information
Gridworlds are popular testbeds for planning with incomplete information but not much is known about their properties. We study a fundamental planning problem, localization, to investigate whether gridworlds make good testbeds for planning with incomplete information. We find empirically that greedy planning methods that interleave planning and plan execution can localize robots very quickly on...
A Survey of Reinforcement Learning Methods in the Windy and Cliff-walking Gridworlds
This report details the implementation of three reinforcement learning methods, Monte Carlo, SARSA, and Q-Learning, and compares their performance in the Windy and CliffWalking Gridworlds.
Efficient Incremental Search for Moving Target Search
Incremental search algorithms reuse information from previous searches to speed up the current search and are thus often able to find shortest paths for series of similar search problems faster than by solving each search problem independently from scratch. However, they do poorly on moving target search problems, where both the start and goal cells change over time. In this paper, we thus deve...
Subgoal Graphs for Eight-Neighbor Gridworlds
We propose a method for preprocessing an eight-neighbor gridworld to generate a subgoal graph and a method for using this subgoal graph to find shortest paths faster than A*, by first finding high-level paths through subgoals and then shortest low-level paths between consecutive subgoals on the high-level path.
Robust Computer Algebra, Theorem Proving, and Oracle AI
In the context of superintelligent AI systems, the term “oracle” has two meanings. One refers to modular systems queried for domain-specific tasks. Another usage, referring to a class of systems which may be useful for addressing the value alignment and AI control problems, is a superintelligent AI system that only answers questions. The aim of this manuscript is to survey contemporary research...
Journal title:
- CoRR
Volume abs/1711.09883
Publication date 2017